19 research outputs found
Variational Inference of Joint Models using Multivariate Gaussian Convolution Processes
We present a non-parametric prognostic framework for individualized event
prediction based on joint modeling of longitudinal and time-to-event data.
Our approach exploits a multivariate Gaussian convolution process (MGCP) to
model the evolution of longitudinal signals and a Cox model to link the
time-to-event data with the longitudinal signals modeled through the MGCP.
Taking advantage of the unique structure imposed by convolved processes, we
provide a variational inference framework to simultaneously estimate the
parameters of the joint MGCP-Cox model. This significantly reduces
computational complexity and safeguards against model overfitting.
Experiments on synthetic and real-world data show that the proposed
framework outperforms state-of-the-art approaches built on two-stage
inference and strong parametric assumptions.
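For intuition, a minimal sketch of the two ingredients in standard
notation: the convolved-GP construction for the longitudinal signals and
the usual joint-model hazard. The symbols and the exact parameterization
here are illustrative, not necessarily the paper's.

```latex
% Signal i is a sum of smoothed versions of Q shared latent GPs z_q:
f_i(t) \;=\; \sum_{q=1}^{Q} \int k_{iq}(t-u)\, z_q(u)\, du .
% A Cox hazard links the event time to covariates w_i and the trajectory:
h_i(t) \;=\; h_0(t)\,\exp\!\big\{\gamma^{\top} w_i + \alpha\, f_i(t)\big\} .
```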
Minimizing Negative Transfer of Knowledge in Multivariate Gaussian Processes: A Scalable and Regularized Approach
Recently there has been increasing interest in the multivariate Gaussian
process (MGP), which extends the Gaussian process (GP) to deal with
multiple outputs. One approach to constructing the MGP that accounts for
non-trivial commonalities amongst outputs employs a convolution process
(CP). The CP is based on the idea of sharing latent functions across
several convolutions. Despite the elegance of the CP construction, it poses
new challenges that have yet to be tackled. First, even with a moderate
number of outputs, model building is prohibitively expensive due to the
large increase in computational demands and in the number of parameters to
be estimated. Second, negative transfer of knowledge may occur when some
outputs do not share commonalities. In this paper we address these issues.
We propose a regularized pairwise modeling approach for the MGP established
using the CP. The key feature of our approach is to distribute the
estimation of the full multivariate model across a group of bivariate GPs
that are built individually. Interestingly, pairwise modeling turns out to
possess unique characteristics that allow us to tackle the challenge of
negative transfer by penalizing the latent function that facilitates
information sharing in each bivariate model. Predictions are then made by
combining predictions from the bivariate models within a Bayesian
framework. The proposed method scales well when the number of outputs is
large and minimizes the negative transfer of knowledge between uncorrelated
outputs. Statistical guarantees for the proposed method are studied and its
advantageous features are demonstrated through numerical studies.
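As a concrete illustration of the penalization idea, the following sketch
fits a bivariate GP in which a single shared latent function carries all
cross-covariance and its mixing weight is L1-penalized, so that transfer
shuts off when the outputs are unrelated. It uses a simple linear-mixing
construction rather than the paper's convolution process, and all names and
settings (rho, lam, the kernels) are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def rbf(x1, x2, ls):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def penalized_nll(theta, x1, y1, x2, y2, lam=1.0):
    ls, rho, noise = np.exp(theta[0]), theta[1], np.exp(theta[2])
    # f1 = g + h1 and f2 = rho*g + h2, with g, h1, h2 independent GPs:
    # the shared latent g is the only source of cross-covariance.
    K11 = rbf(x1, x1, ls) + rbf(x1, x1, ls)
    K22 = rho ** 2 * rbf(x2, x2, ls) + rbf(x2, x2, ls)
    K12 = rho * rbf(x1, x2, ls)
    K = np.block([[K11, K12], [K12.T, K22]]) \
        + noise * np.eye(len(x1) + len(x2))
    y = np.concatenate([y1, y2])
    L = np.linalg.cholesky(K)
    a = np.linalg.solve(L.T, np.linalg.solve(L, y))
    nll = 0.5 * y @ a + np.log(np.diag(L)).sum()
    return nll + lam * abs(rho)  # penalty can shut the sharing path off

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30)
y1 = np.sin(6 * x) + 0.1 * rng.standard_normal(30)
y2 = rng.standard_normal(30)  # unrelated output: rho should shrink toward 0
res = minimize(penalized_nll,
               x0=np.array([np.log(0.2), 0.5, np.log(0.05)]),
               args=(x, y1, x, y2))
print("estimated sharing weight rho:", res.x[1])
```

Because each pair is fit independently and predictions are recombined
afterwards, this style of decomposition is what gives the pairwise approach
its scalability in the number of outputs.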
On Negative Transfer and Structure of Latent Functions in Multi-output Gaussian Processes
The multi-output Gaussian process (MGP) is based on the assumption that
outputs share commonalities; however, if this assumption does not hold,
negative transfer will lead to decreased performance relative to learning
outputs independently or in subsets. In this article, we first define
negative transfer in the context of an MGP and then derive necessary
conditions for an MGP model to avoid negative transfer. Specifically, under
the convolution construction, we show that avoiding negative transfer
mainly depends on having a sufficient number of latent functions,
regardless of the flexibility of the kernel or the inference procedure
used. However, a slight increase in the number of latent functions leads to
a large increase in the number of parameters to be estimated. To this end,
we propose two latent structures that scale to arbitrarily large datasets,
can avoid negative transfer, and allow any kernel or sparse approximation
to be used within them. These structures also allow regularization, which
can provide consistent and automatic selection of related outputs.
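To see why the number of latent functions matters, consider the simplest
case of two outputs driven by a single shared latent GP. This is a sketch
in generic convolution-process notation, not the paper's exact conditions.

```latex
% Two outputs sharing one latent GP z:
f_1(x) = \int k_1(x-u)\, z(u)\, du, \qquad
f_2(x) = \int k_2(x-u)\, z(u)\, du .
% Their cross-covariance is
\mathrm{Cov}\big(f_1(x), f_2(x')\big)
  = \iint k_1(x-u)\, k_2(x'-v)\,\mathrm{Cov}\big(z(u), z(v)\big)\, du\, dv ,
```

which can vanish only if one of the smoothing kernels is essentially zero,
and that in turn kills the corresponding output's own variance. With a
single latent function, unrelated outputs are therefore forced to share;
adding latent functions removes this constraint.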
The Rényi Gaussian Process: Towards Improved Generalization
We introduce an alternative closed-form lower bound on the Gaussian process
(GP) likelihood based on the Rényi α-divergence. This new lower bound can
be viewed as a convex combination of the Nyström approximation and the
exact GP. The key advantage of this bound is its capability to control and
tune the regularization enforced on the model; it is thus a generalization
of traditional variational GP regression. From a theoretical perspective,
we provide the convergence rate and risk bound for inference using our
proposed approach. Experiments on real data show that the proposed
algorithm may deliver improvements over several inference methods.
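For reference, the traditional variational bound that this work
generalizes, written in standard sparse-GP notation with inducing points
(the Rényi α-dependent form itself is given in the paper, not here):

```latex
\log p(\mathbf{y}) \;\ge\;
  \log \mathcal{N}\!\big(\mathbf{y}\,\big|\,\mathbf{0},\; Q_{nn} + \sigma^2 I\big)
  \;-\; \frac{1}{2\sigma^2}\,\mathrm{tr}\big(K_{nn} - Q_{nn}\big),
\qquad Q_{nn} = K_{nm} K_{mm}^{-1} K_{mn} .
```

The trace term penalizes the Nyström approximation error; viewing the bound
as a convex combination of the Nyström approximation and the exact GP makes
the strength of this regularization tunable.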
Functional Principal Component Analysis for Extrapolating Multi-stream Longitudinal Data
Advances in modern sensor technology enable the collection of multi-stream
longitudinal data, where multiple signals from different units are
collected in real time. In this article, we present a non-parametric
approach to predict the evolution of multi-stream longitudinal data for an
in-service unit by borrowing strength from other, historical units. Our
approach first decomposes each stream into a linear combination of
eigenfunctions and their corresponding functional principal component (FPC)
scores. A Gaussian process prior for the FPC scores is then established
based on a functional semi-metric that measures similarities between
streams of historical units and the in-service unit. Finally, an empirical
Bayesian updating strategy is derived to update the established prior using
real-time stream data obtained from the in-service unit. Experiments on
synthetic and real-world data show that the proposed framework outperforms
state-of-the-art approaches, effectively accounts for heterogeneity, and
achieves high predictive accuracy.
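The decomposition step can be sketched in a few lines of numpy: estimate
the mean and covariance of the historical streams on a shared grid, take
the leading eigenfunctions, and represent a partially observed in-service
stream by projecting onto them. The GP prior over the scores and the
empirical Bayesian updating are not shown, and the data and settings are
illustrative.

```python
import numpy as np

def fpca(Y, n_components=3):
    """Y: (n_units, n_grid) historical streams observed on a shared grid."""
    mu = Y.mean(axis=0)
    C = np.cov(Y - mu, rowvar=False)          # pointwise covariance estimate
    vals, vecs = np.linalg.eigh(C)
    phi = vecs[:, ::-1][:, :n_components]     # leading eigenfunctions
    scores = (Y - mu) @ phi                   # FPC scores, one row per unit
    return mu, phi, scores

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 100)
Y = np.array([np.sin(2 * np.pi * t) * rng.normal(1.0, 0.2)
              + rng.normal(0, 0.05, t.size) for _ in range(40)])
mu, phi, scores = fpca(Y)

# Extrapolate an in-service unit observed only on the first 60 grid points
# by least-squares projection of the partial stream onto the eigenfunctions.
obs = slice(0, 60)
y_new = np.sin(2 * np.pi * t) * 1.1 + rng.normal(0, 0.05, t.size)
xi, *_ = np.linalg.lstsq(phi[obs], y_new[obs] - mu[obs], rcond=None)
y_extrapolated = mu + phi @ xi                # full predicted trajectory
```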
Why Non-myopic Bayesian Optimization is Promising and How Far Should We Look-ahead? A Study via Rollout
Lookahead (also known as non-myopic) Bayesian optimization (BO) aims to
find optimal sampling policies by solving a dynamic programming (DP)
formulation that maximizes a long-term reward over a rolling horizon.
Though promising, lookahead BO faces the risk of error propagation through
its increased dependence on a possibly mis-specified model. In this work we
focus on the rollout approximation for solving the intractable DP. We first
prove the improving nature of rollout in tackling lookahead BO and provide
a sufficient condition for the base heuristic to be rollout improving. We
then provide both a theoretical and a practical guideline for deciding on
the rolling horizon stagewise. This guideline is built on quantifying the
negative effect of a mis-specified model. To illustrate our idea, we
provide case studies on both single- and multi-information-source BO.
Empirical results show the advantageous properties of our method over
several myopic and non-myopic BO algorithms.
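The rollout idea itself is compact: score a candidate by its immediate
improvement plus the improvement a base heuristic (here, expected
improvement) would go on to collect over the remaining horizon, averaged
over outcomes simulated from the current GP. The sketch below is a plain
single-source toy with a minimal numpy GP; the paper's
multi-information-source setting and stagewise horizon rule are not
reproduced.

```python
import numpy as np
from scipy.stats import norm

def rbf(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / 0.1)

def gp_post(X, y, Xs, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xs)
    m = Ks.T @ np.linalg.solve(K, y)
    v = np.clip(1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0),
                1e-12, None)
    return m, np.sqrt(v)

def ei(m, s, best):                     # expected improvement (minimization)
    z = (best - m) / s
    return s * (z * norm.cdf(z) + norm.pdf(z))

def rollout(x, X, y, grid, h, n_fantasy=8, rng=np.random.default_rng(0)):
    if h == 0:
        return 0.0
    m, s = gp_post(X, y, np.array([x]))
    vals = []
    for _ in range(n_fantasy):
        yf = rng.normal(m[0], s[0])     # fantasize an outcome at x
        Xf, Yf = np.append(X, x), np.append(y, yf)
        mg, sg = gp_post(Xf, Yf, grid)
        nxt = grid[np.argmax(ei(mg, sg, Yf.min()))]  # EI picks next point
        vals.append(max(y.min() - yf, 0.0)
                    + rollout(nxt, Xf, Yf, grid, h - 1, 2, rng))
    return float(np.mean(vals))

X, y = np.array([0.1, 0.5, 0.9]), np.sin(3 * np.array([0.1, 0.5, 0.9]))
grid = np.linspace(0, 1, 50)
scores = [rollout(x, X, y, grid, h=2) for x in grid]
print("2-step rollout suggests x =", grid[int(np.argmax(scores))])
```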
Heterogeneous Matrix Factorization: When Features Differ by Datasets
In myriad statistical applications, data are collected from related but
heterogeneous sources. These sources share some commonalities while
containing idiosyncratic characteristics. More specifically, consider the
setting where observation matrices from multiple sources are generated from
a few common and source-specific factors. Is it possible to recover the
shared and source-specific factors? We show that under appropriate
conditions on the alignment of the source-specific factors, the problem is
well-defined and both shared and source-specific factors are identifiable
under a constrained matrix factorization objective. To solve this
objective, we propose a new class of matrix factorization algorithms called
Heterogeneous Matrix Factorization (HMF). HMF is easy to implement, enjoys
local linear convergence under suitable assumptions, and is intrinsically
distributed. Through a variety of empirical studies, we showcase the
advantageous properties of HMF and its potential applications in feature
extraction and change detection.
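The shared-plus-local factor model can be illustrated with a plain
alternating least-squares loop: each source's matrix is approximated by a
global factor matrix common to all sources plus a source-specific one. This
is an illustration of the model, not HMF itself; the paper's algorithm, its
identifiability conditions, and its convergence guarantees are not
reproduced here.

```python
import numpy as np

def als_shared_local(Ys, r_shared=2, r_local=1, iters=200, seed=0):
    """Fit Y_i ~= U @ V_i + U_i @ W_i with shared U and local U_i."""
    rng = np.random.default_rng(seed)
    d = Ys[0].shape[0]
    U = rng.normal(size=(d, r_shared))
    Us = [rng.normal(size=(d, r_local)) for _ in Ys]
    for _ in range(iters):
        Vs, Ws = [], []
        for Y, Ui in zip(Ys, Us):
            B = np.hstack([U, Ui])                    # joint basis
            C = np.linalg.lstsq(B, Y, rcond=None)[0]  # coefficients
            Vs.append(C[:r_shared])
            Ws.append(C[r_shared:])
        # Update shared factors against residuals pooled across all sources.
        num = sum((Y - Ui @ W) @ V.T for Y, Ui, W, V in zip(Ys, Us, Ws, Vs))
        den = sum(V @ V.T for V in Vs)
        U = num @ np.linalg.inv(den)
        # Update each source's local factors against its own residual.
        Us = [(Y - U @ V) @ W.T @ np.linalg.inv(W @ W.T)
              for Y, V, W in zip(Ys, Vs, Ws)]
    return U, Us

rng = np.random.default_rng(1)
U_true = rng.normal(size=(20, 2))                    # factors shared by all
Ys = [U_true @ rng.normal(size=(2, 50))              # shared part
      + rng.normal(size=(20, 1)) @ rng.normal(size=(1, 50))  # local part
      + 0.01 * rng.normal(size=(20, 50))             # noise
      for _ in range(3)]
U_hat, Us_hat = als_shared_local(Ys)
```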
Weakly-supervised Multi-output Regression via Correlated Gaussian Processes
Multi-output regression seeks to infer multiple latent functions using data
from multiple groups/sources while accounting for potential between-group
similarities. In this paper, we consider multi-output regression under a
weakly-supervised setting in which a subset of the data points from the
multiple groups are unlabeled. We use dependent Gaussian processes for the
multiple outputs, constructed by convolutions with shared latent processes.
We introduce hyperpriors for the multinomial probabilities of the
unobserved labels and optimize the hyperparameters, which we show improves
estimation. We derive two variational bounds: (i) a modified variational
bound for fast and stable convergence in model inference, and (ii) a
scalable variational bound that is amenable to stochastic optimization.
Experiments on synthetic and real-world data show that the proposed model
outperforms state-of-the-art models, with more accurate estimation of the
multiple latent functions and of the unobserved labels.
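A sketch of the label model for the unlabeled points, in generic mixture
notation: each unlabeled observation mixes over the possible group
assignments, with a hyperprior on the assignment probabilities. The exact
hyperprior and both variational bounds in the paper differ in detail; this
only illustrates the structure.

```latex
% Unobserved label z_n with multinomial probabilities \pi_n under a
% hyperprior (a Dirichlet is one natural choice):
\pi_n \sim \mathrm{Dir}(\boldsymbol{\alpha}), \qquad
z_n \mid \pi_n \sim \mathrm{Multinomial}(\pi_n),
% so an unlabeled point contributes a mixture over group-specific outputs:
p(y_n \mid x_n) \;=\; \sum_{g=1}^{G} \pi_{ng}\; p\big(y_n \mid f_g(x_n)\big).
```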
SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization
In an effort to improve generalization in deep learning and to automate
learning rate scheduling, we propose SALR: a sharpness-aware learning rate
update technique designed to recover flat minimizers. Our method
dynamically updates the learning rate of gradient-based optimizers based on
the local sharpness of the loss function. This allows optimizers to
automatically increase the learning rate at sharp valleys, raising the
chance of escaping them. We demonstrate the effectiveness of SALR when
adopted by various algorithms over a broad range of networks. Our
experiments indicate that SALR improves generalization, converges faster,
and drives solutions to significantly flatter regions.
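The mechanism can be mimicked in a few lines of PyTorch: scale a base
learning rate by a proxy for local sharpness before each optimizer step.
The proxy below (squared gradient norm relative to its running average) and
all settings are illustrative assumptions, not necessarily SALR's actual
sharpness estimator or update rule.

```python
import torch

def sharpness_scaled_step(optimizer, params, base_lr, state, beta=0.99):
    # Proxy for local sharpness: squared gradient norm relative to its
    # exponential moving average; > 1 suggests a locally sharp region.
    g2 = sum(float((p.grad ** 2).sum())
             for p in params if p.grad is not None)
    state["avg"] = beta * state.get("avg", g2) + (1 - beta) * g2
    scale = g2 / (state["avg"] + 1e-12)
    for group in optimizer.param_groups:
        group["lr"] = base_lr * scale  # larger steps escape sharp valleys
    optimizer.step()

# Toy usage: one scaled SGD step on a quadratic loss.
w = torch.nn.Parameter(torch.tensor([3.0]))
opt = torch.optim.SGD([w], lr=0.1)
state = {}
loss = (w ** 2).sum()
opt.zero_grad()
loss.backward()
sharpness_scaled_step(opt, [w], base_lr=0.1, state=state)
```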
Personalized Dictionary Learning for Heterogeneous Datasets
We introduce a relevant yet challenging problem named Personalized
Dictionary Learning (PerDL), where the goal is to learn sparse linear
representations from heterogeneous datasets that share some commonality. In
PerDL, we model each dataset's shared and unique features as global and
local dictionaries. Challenges for PerDL are not only inherited from
classical dictionary learning (DL) but also arise from the unknown nature
of the shared and unique features. In this paper, we rigorously formulate
the problem and provide conditions under which the global and local
dictionaries can be provably disentangled. Under these conditions, we
provide a meta-algorithm called Personalized Matching and Averaging (PerMA)
that can recover both global and local dictionaries from heterogeneous
datasets. PerMA is highly efficient; it converges to the ground truth at a
linear rate under suitable conditions. Moreover, it automatically borrows
strength from strong learners to improve the prediction of weak learners.
As a general framework for extracting global and local dictionaries, we
show the application of PerDL in different learning tasks, such as training
with imbalanced datasets and video surveillance.
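The flavor of the objective can be written in generic dictionary-learning
notation: each dataset is sparsely coded against the concatenation of a
global dictionary and its own local one. This illustrates the
shared-plus-local structure only; the paper's exact formulation,
constraints, and disentanglement conditions are not reproduced here.

```latex
% Dataset Y^{(i)} is coded against the global dictionary D (shared by all
% datasets) concatenated with its local dictionary D^{(i)}:
\min_{D,\;\{D^{(i)},\, X^{(i)}\}}\;
  \sum_{i=1}^{N} \Big\| Y^{(i)} - \big[\, D,\; D^{(i)} \big]\, X^{(i)} \Big\|_F^2
  \;+\; \lambda \sum_{i=1}^{N} \big\| X^{(i)} \big\|_1 .
```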